datafusion-proto 36.0.0

Protobuf serialization of DataFusion logical plan expressions
docs.rs failed to build datafusion-proto-36.0.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build: datafusion-proto-39.0.0

Apache Arrow DataFusion Proto

Apache Arrow DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

This crate provides support format for serializing and deserializing the following structures to and from bytes:

  1. LogicalPlan's (including Expr),
  2. ExecutionPlans (including PhysiscalExpr)

This format can be useful for sending plans over the network, for example when building a distributed query engine.

Internally, this crate is implemented by converting the plans to protocol buffers using prost.

See Also

The binary format created by this crate supports the full range of DataFusion plans, but is DataFusion specific. See datafusion-substrait which can encode many DataFusion plans using the substrait.io standard.

Examples

Serializing Expressions

Based on examples/expr_serde.rs

use datafusion_common::Result;
use datafusion_expr::{col, lit, Expr};
use datafusion_proto::bytes::Serializeable;

fn main() -> Result<()> {
    // Create a new `Expr` a < 32
    let expr = col("a").lt(lit(5i32));

    // Convert it to an opaque form
    let bytes = expr.to_bytes()?;

    // Decode bytes from somewhere (over network, etc.)
    let decoded_expr = Expr::from_bytes(&bytes)?;
    assert_eq!(expr, decoded_expr);
    Ok(())
}

Serializing Logical Plans

Based on examples/logical_plan_serde.rs

use datafusion::prelude::*;
use datafusion_common::Result;
use datafusion_proto::bytes::{logical_plan_from_bytes, logical_plan_to_bytes};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("t1", "tests/testdata/test.csv", CsvReadOptions::default())
        .await
        ?;
    let plan = ctx.table("t1").await?.into_optimized_plan()?;
    let bytes = logical_plan_to_bytes(&plan)?;
    let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
    assert_eq!(format!("{:?}", plan), format!("{:?}", logical_round_trip));
    Ok(())
}

Serializing Physical Plans

Based on examples/physical_plan_serde.rs

use datafusion::prelude::*;
use datafusion_common::Result;
use datafusion_proto::bytes::{physical_plan_from_bytes,physical_plan_to_bytes};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("t1", "tests/testdata/test.csv", CsvReadOptions::default())
        .await
        ?;
    let logical_plan = ctx.table("t1").await?.into_optimized_plan()?;
    let physical_plan = ctx.create_physical_plan(&logical_plan).await?;
    let bytes = physical_plan_to_bytes(physical_plan.clone())?;
    let physical_round_trip = physical_plan_from_bytes(&bytes, &ctx)?;
    assert_eq!(format!("{:?}", physical_plan), format!("{:?}", physical_round_trip));
    Ok(())
}

Generated Code

The prost/tonic code can be generated by running, which in turn invokes the Rust binary located in gen

This is necessary after modifying the protobuf definitions or altering the dependencies of gen, and requires a valid installation of protoc.

./regen.sh